This contribution presents a deep learning method for extracting and fusing information about kidney stone fragments acquired from different endoscope viewpoints. Surface and section fragment images are jointly used during the training of the classifier to improve the discrimination power of the features by adding attention layers at the end of each convolutional block. This approach is specifically designed to mimic the morpho-constitutional analysis performed ex vivo by biologists, who visually identify kidney stones by inspecting both views. Adding attention mechanisms to the backbone improved the results of single-view extraction backbones by 4% on average. Moreover, in comparison to the state of the art, fusing the deep features improved kidney stone classification accuracy by up to 11%.
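Endoscopy is the most widely used medical technique for detecting cancer and polyps inside hollow organs. However, images acquired by an endoscope are frequently affected by illumination artefacts caused by the orientation of the light source. Two major issues arise when the pose of the endoscope's light source changes suddenly: overexposed and underexposed tissue regions are produced. Both scenarios can lead to misdiagnosis because of the lack of information in the affected regions, or can hamper the performance of the computer vision methods (e.g., SLAM, structure from motion, optical flow) used during non-invasive examination. The aim of this work is two-fold: i) to introduce a new synthetically generated dataset produced by a generative adversarial technique, and ii) to explore both shallow and deep learning-based image enhancement methods in overexposed and underexposed lighting conditions. The best quantitative (i.e., metric-based) results were obtained by the deep-network-based LMSPEC method, which also runs at around 7.6 fps.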
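In this contribution, we use an ensemble deep learning method to combine the predictions of two individual one-stage detectors (i.e., YOLOv4 and YOLACT) with the aim of detecting artefacts in endoscopic images. This ensemble strategy enabled us to improve the robustness of the individual models without compromising their real-time computation capabilities. We demonstrate the effectiveness of our approach by training and testing the two individual models and various ensemble configurations on the "Endoscopic Artifact Detection Challenge" dataset. Extensive experiments show the superiority, in terms of mean average precision, of the ensemble approach over the individual models and previous works.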
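The abstract above does not say how the two detectors' predictions are combined. As a minimal sketch of one common option (an assumption, not the authors' method), detections from both models can be pooled and de-duplicated with class-wise non-maximum suppression. The function names, the IoU threshold of 0.5, and the class labels below are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble_nms(dets_a, dets_b, iou_thr=0.5):
    """Pool detections from two models and suppress same-class overlaps.

    Each detection is a (box, score, label) tuple. This pooling rule is an
    illustrative assumption; the abstract does not specify the ensemble rule.
    """
    # Visit detections from most to least confident.
    pooled = sorted(dets_a + dets_b, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, label in pooled:
        # Keep a box only if no already-kept box of the same class overlaps it.
        if all(l != label or iou(box, b) < iou_thr for b, _, l in kept):
            kept.append((box, score, label))
    return kept


# Hypothetical outputs of two detectors on the same frame.
model_a = [((10, 10, 50, 50), 0.90, "specularity")]
model_b = [
    ((12, 11, 52, 49), 0.80, "specularity"),
    ((100, 100, 140, 150), 0.70, "bubbles"),
]
merged = ensemble_nms(model_a, model_b)
```

Here the two overlapping "specularity" boxes collapse into the higher-scoring one, while the "bubbles" box found by only one model is retained, which is how pooling can raise recall relative to either model alone.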
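Polyps in the colon are well-known cancer precursors identified by colonoscopy, whether performed as part of a diagnostic work-up for symptoms, colorectal cancer screening, or systematic surveillance of certain diseases. While most polyps are benign, the number, size, and surface structure of polyps are tightly linked to the risk of colon cancer. Colonic polyps have a high missed-detection rate and are often incompletely removed because of their variable nature, the difficulty of delineating the abnormality, high recurrence rates, and the anatomical topography of the colon. In the past, several methods have been built to automate polyp detection and segmentation. However, the key issue with most methods is that they have not been tested rigorously on a large, multi-centre, purpose-built dataset. As a result, these methods may not generalise to datasets from different populations, because they overfit to a specific population and to specific endoscopic surveillance. To address this, we have curated a dataset from six different centres incorporating more than 300 patients. The dataset includes both single-frame and sequence data with 3446 annotated polyp labels, with precise delineation of polyp boundaries verified by six senior gastroenterologists. To our knowledge, this is the most comprehensive detection and pixel-level segmentation dataset curated by a team of computational scientists and expert gastroenterologists. The dataset originated as part of the EndoCV2021 challenge, which aimed at addressing generalisability in polyp detection and segmentation. In this paper, we provide comprehensive insight into data construction and annotation strategies, annotation quality assurance, and technical validation for our extended EndoCV2021 dataset, which we refer to as PolypGen.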
Robotic teleoperation is a key technology for a wide variety of applications. It allows sending robots instead of humans to remote, possibly dangerous locations while still using the human brain with its enormous knowledge and creativity, especially for solving unexpected problems. A main challenge in teleoperation consists of providing enough feedback to the human operator for situation awareness, thus creating full immersion, as well as offering the operator suitable control interfaces to achieve efficient and robust task fulfillment. We present a bimanual telemanipulation system consisting of an anthropomorphic avatar robot and an operator station providing force and haptic feedback to the human operator. The avatar arms are controlled in Cartesian space with a direct mapping of the operator movements. The measured forces and torques on the avatar side are haptically displayed to the operator. We developed a predictive avatar model for limit avoidance which runs on the operator side, ensuring low latency. The system was successfully evaluated during the ANA Avatar XPRIZE competition semifinals. In addition, we performed in-lab experiments and carried out a small user study with mostly untrained operators.
The purpose of this work was to tackle practical issues which arise when using a tendon-driven robotic manipulator with a long, passive, flexible proximal section in medical applications. A separable robot which overcomes difficulties in actuation and sterilization is introduced, in which the body containing the electronics is reusable and the remainder is disposable. A control input which resolves the redundancy in the kinematics and a physical interpretation of this redundancy are provided. The effect of a static change in the proximal section angle on bending angle error was explored under four testing conditions for a sinusoidal input. Bending angle error increased with increasing proximal section angle for all testing conditions, with an average error reduction of 41.48% for re-tension, 4.28% for hysteresis, and 52.35% for re-tension + hysteresis compensation relative to the baseline case. Two major sources of error in tracking the bending angle were identified: time delay from hysteresis and DC offset from the proximal section angle. Examination of these error sources revealed that the simple hysteresis compensation was most effective for removing time delay, and re-tension compensation for removing DC offset, which was the primary source of increasing error. The re-tension compensation was also tested for dynamic changes in the proximal section and reduced error in the final configuration of the tip by 89.14% relative to the baseline case.
Learning-enabled autonomous systems provide increased capabilities compared to traditional systems. However, the complexity and probabilistic nature of the underlying methods enabling such capabilities present challenges for current systems engineering processes for assurance and for test, evaluation, verification, and validation (TEVV). This paper provides a preliminary attempt to map recently developed technical approaches from the literature on assurance and TEVV of learning-enabled autonomous systems (LEAS) to a traditional systems engineering V-model. This mapping categorizes such techniques into three main approaches: development, acquisition, and sustainment. We review the latest techniques to develop safe, reliable, and resilient learning-enabled autonomous systems, without recommending radical and impractical changes to existing systems engineering processes. By performing this mapping, we seek to assist acquisition professionals by (i) informing comprehensive test and evaluation planning, and (ii) objectively communicating risk to leaders.
In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts. However, many existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs). We address two limitations of existing IRL techniques. First, they require an excessive amount of data due to the information asymmetry between the expert and the learner. Second, most of these IRL techniques require solving the computationally intractable forward problem -- computing an optimal policy given a reward function -- in POMDPs. The developed algorithm reduces the information asymmetry while increasing the data efficiency by incorporating task specifications expressed in temporal logic into IRL. Such specifications may be interpreted as side information available to the learner a priori in addition to the demonstrations. Further, the algorithm avoids a common source of algorithmic complexity by building on causal entropy, as opposed to entropy, as the measure of the likelihood of the demonstrations. Nevertheless, the resulting problem is nonconvex due to the so-called forward problem. We solve the intrinsic nonconvexity of the forward problem in a scalable manner through a sequential linear programming scheme that is guaranteed to converge to a locally optimal policy. In a series of examples, including experiments in a high-fidelity Unity simulator, we demonstrate that even with a limited amount of data and POMDPs with tens of thousands of states, our algorithm learns reward functions and policies that satisfy the task while inducing similar behavior to the expert by leveraging the provided side information.
Speech-driven 3D facial animation has been widely explored, with applications in gaming, character animation, virtual reality, and telepresence systems. State-of-the-art methods deform the face topology of the target actor to sync with the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor, resulting in unrealistic and inaccurate lip movements. To address this, we present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video and produces novel facial expressions matching the identity-specific speaking style and facial idiosyncrasies of the target actor. Specifically, we train a style-agnostic transformer on a large facial expression dataset which we use as a prior for audio-driven facial expressions. Based on this prior, we optimize for identity-specific speaking style using a short reference video. To train the prior, we introduce a novel loss function based on detected bilabial consonants to ensure plausible lip closures and consequently improve the realism of the generated expressions. Through detailed experiments and a user study, we show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
We study the problem of graph clustering under a broad class of objectives in which the quality of a cluster is defined as the ratio between the number of edges in the cluster and the total weight of vertices in the cluster. We show that our definition is closely related to popular clustering measures, namely normalized associations, which is a dual of the normalized cut objective, and normalized modularity. We give a linear-time constant-factor approximation algorithm for our objective, which implies the first constant-factor approximation algorithms for normalized modularity and normalized associations.
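To make the cluster-quality ratio above concrete, the following minimal sketch computes it for one cluster of a small unweighted graph. It only evaluates the ratio as worded in the abstract; it is not the approximation algorithm, and the function name, the toy graph, and the choice of vertex degrees as weights are assumptions made for illustration.

```python
def cluster_quality(edges, weights, cluster):
    """Ratio of edges inside the cluster to the cluster's total vertex weight.

    edges   -- iterable of (u, v) pairs of an undirected graph
    weights -- mapping from vertex to a non-negative weight
    cluster -- iterable of vertices forming the cluster
    """
    members = set(cluster)
    # Count edges with both endpoints inside the cluster.
    internal = sum(1 for u, v in edges if u in members and v in members)
    total_weight = sum(weights[v] for v in members)
    if total_weight == 0:
        raise ValueError("cluster has zero total vertex weight")
    return internal / total_weight


# Toy graph: a triangle on {0, 1, 2} with a pendant vertex 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
# Assumption for illustration: each vertex is weighted by its degree.
degree_weights = {0: 2, 1: 2, 2: 3, 3: 1}
quality = cluster_quality(edges, degree_weights, {0, 1, 2})
```

For the triangle, three edges lie inside the cluster and the total weight is 7, so the ratio is 3/7.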